if (5 > 0) {
print("We're inside the if statement")
}
[1] "We're inside the if statement"
July 2, 2025
In this workshop we cover the building blocks for developing more complex code, looking at
In the first half of this session we’ll look at two types of control flows: conditionals and loops.
Conditionals allow you to put “gates” in your code, only running sections if a certain condition is true. They are common to most programming languages.
In R, they are called if
statements, because you use the if
command. For example,
The line print("We're inside the if statement")
will only run if 5 > 0
is true. If not, it’ll get skipped.
Curly brackets are essential. Only code inside them will be governed by conditional
[1] "We're inside the if statement"
[1] "This code always runs"
Watch what happens if we change the condition
[1] "This code always runs"
Now, the first line doesn’t run. That’s the essence of a conditional.
There’s not much point to using a condition that will always be true. Typically, you’d use a variable in the condition, for example.
Here is a table of the different operators you can make conditions with. When you run them, they always return either True
or False
.
Operator | True example | Description |
---|---|---|
== |
10 == 10 |
Same value and type |
!= |
"10" != 10 |
Different value or type |
> |
10 > 5 |
Greater than |
>= |
10 >= 10 |
Greater than or equal to |
< |
5 < 10 |
Less than |
<= |
5 <= 10 |
Less than or equal to |
&& |
10 == 10 && "apple" == "apple" |
Only true if both conditions are true. |
|| |
10 == 10 || "a" == "b" |
Always true if one condition is true. |
elif
and else
if
statements only run if the condition is True. What happens if its False? That’s what the else
command is for, it’s like a net that catches anything that slipped past if
:
pet_age <- 5
if (pet_age > 10) {
print("My pet is older than 10")
} else {
print("My pet is 10 or younger")
}
[1] "My pet is 10 or younger"
else
also needs curly brackets!
Check what happens when you change the age from 5 to 15.
Finally, what if you wanted to check another condition only if the first one fails? That’s what else if
is for. It’s another if statement but it only runs if the first fails.
pet_age = 5
if (pet_age > 10) {
print("My pet is older than 10")
} else if (pet_age < 5) {
print("My pet is younger than 5")
} else {
print("My pet is 10 or younger")
}
[1] "My pet is 10 or younger"
You can include as many as you’d like
Sometimes you need to repeat a task multiple times. Sometimes hundreds. Maybe you need to loop through 1 million pieces of data. Not fun.
R’s loops offer us a way to run a section of code multiple times. There are two types: for
loops, which run the code once for each element in a sequence (like a list or string), and while
loops, which run until some condition is false.
while
loopsThese are almost the same as if
statements, except for the fact that they run the code multiple times. Let’s begin with a basic conditional
The
paste
function lets you print multiple things together
What if we wanted to check all the numbers between 5 and 10? We can use a while loop.
[1] "1 is less than 10."
[1] "2 is less than 10."
[1] "3 is less than 10."
[1] "4 is less than 10."
We need to include
paste
inside
We’ve done two things
if
with while
number = number + 1
to increase the number each time.Without step 2, we’d have an infinite loop – one that never stops, because the condition would always be true!
While loops are useful for repeating code an indeterminate number of times.
for
loopsRealistically, you’re most likely to use a for loop. They’re inherently safer (you can’t have an infinite loop) and often handier.
In R, for
loops iterate through a sequence, like the objects in a list. This is more like other languages’ foreach
, than most’s for
.
Let’s say you have a vector of different fruit
and you want to run a section of code on "apple"
, then "banana"
, then "cherry"
. Maybe you want to know which ones have the letter “a”. We can start with a for
loop
[1] "apple"
[1] "banana"
[1] "cherry"
This loop’s job is to print out the variable fruit
. But where is fruit
defined? Well, the for
loop runs print(fruit)
for every element of list_of_fruits
, storing the current element in the variable fruit
. If we were to write it out explicitly, it would look like
character(0)
[1] "apple"
[1] "banana"
Let’s return to our goal: working out which ones have an “a”. We need to put a conditional inside the loop:
list_of_fruits <- c("apple", "banana", "cherry")
for (fruit in list_of_fruits) {
if (grepl("a", fruit)) {
print(paste("a is in", fruit))
}
else {
print(paste("a is not in", fruit))
}
}
[1] "a is in apple"
[1] "a is in banana"
[1] "a is not in cherry"
Finally, it’s often convenient to loop through a list of numbers. R makes this easy with the x:y
notation:
contains all the integers between \(1\) and \(10\). To loop through each,
The advantage of this approach is that we can loop through many numbers:
This can be useful if you need to loop through multiple objects by indexing. We’ll spare you the output here.
Consider the follow situation. You have a dataset, and want to apply a function to every column. Or maybe some columns. What to do?
You could loop over them with a for loop. Alternatively, you could use the mapping functions in purrr, which simplifies the code.
What is a map? Generally, a map takes something and makes it something else. So far, that’s the same a function. The difference is that a map takes lots of things and translates them all in the same way. For example, a geographical map takes life-sized locations and transforms them all in the same way to a hand-sized piece of paper.
Essentially, maps are a way of transforming a selection of variables in the same way. We’ll start by brining in the purrr
library
Let’s use the same data as in the statistics session:
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
What if you want the median value from all columns? We can use the map_dbl()
function to map doubles (long decimal numbers):
Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
argument is not numeric or logical: returning NA
Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
argument is not numeric or logical: returning NA
Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
argument is not numeric or logical: returning NA
Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
argument is not numeric or logical: returning NA
Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
argument is not numeric or logical: returning NA
name birth_date height_cm positions nationality age
NA NA 183 NA NA 25
club
NA
Don’t worry about the warnings - they’re just there because you can’t take the median of a non-numeric variable. To check which ones are, we can map the logical operator is.numeric
:
name birth_date height_cm positions nationality age
FALSE FALSE TRUE FALSE FALSE TRUE
club
FALSE
We can use the pipe here,
name birth_date height_cm positions nationality age
FALSE FALSE TRUE FALSE FALSE TRUE
club
FALSE
Let’s select the numeric columns and look at the medians again
We can also create custom functions. We use .x
to refer to the variable:
We’ll wrap this session up by looking at custom functions. So far, we’ve only used built-in functions or those from other people’s modules. But we can make our own!
We’ve only ever called functions - this is what we do when we use them. All functions need a definition, this is the code that gets run when they’re called.
Functions are machines. They take some inputs, run some code with those inputs, and spit out one output. We need to define how they work before we use them. We should specify
We include these in three steps
return
statement, specifying the outputFor example, let’s create a function that converts centimetres to metres.
Taking it apart, we have
cm_to_m
value_in_cm
value_in_cm / 100
Importantly, nothing appears when you run this code. Why? Because you’ve only defined the function, you haven’t used it yet.
To use this function, we need to call it. Let’s convert \(10\text{ cm}\) to \(\text{m}\).
When we call the function, it runs with value_in_cm <- 10
.
That’s it! Every function that you use, built-in or imported, looks like this.
Because functions must be defined before called, and defining them produces no output, best practice is to place functions at the top of your script, below the import statements.
One quirk of R functions is that, by default, they return the output of the line. Let’s add a new line that prints the message “\(x\text{ cm} = y\text{ m}\)”. We’ll need to also save our calculation in the process:
cm_to_m <- function(value_in_cm) {
value_in_m <- value_in_cm / 100
print(paste(value_in_cm, "cm =", value_in_m, "m"))
}
cm_to_m(10)
[1] "10 cm = 0.1 m"
It works, but we have a problem. The output of the function is the whole message, not the value. The easiest way to fix this is to call the output on the last line:
cm_to_m <- function(value_in_cm) {
value_in_m <- value_in_cm / 100
print(paste(value_in_cm, "cm =", value_in_m, "m"))
value_in_m
}
cm_to_m(10)
[1] "10 cm = 0.1 m"
[1] 0.1
Alternatively, you can use the return()
function to exit before the end and manually specify the output.